ABSTRACT

Large-scale and deep sky survey missions are rapidly collecting a large number of stellar
spectra, which necessitates estimating atmospheric parameters directly from the spectra and
makes it feasible to statistically investigate latent principles in a large dataset. We present
a technique for estimating the parameters Teff, log g and [Fe/H] from stellar spectra. With this
technique, we first extract features from stellar spectra using the LASSO algorithm; then, the
parameters are estimated from the extracted features using SVR. On a subsample of 20 000
stellar spectra from SDSS with reference parameters provided by the SDSS/SEGUE pipeline SSPP,
the estimation consistencies are 0.007458 dex for log Teff (101.609921 K for Teff), 0.189557 dex
for log g and 0.182060 dex for [Fe/H], where the consistency is evaluated by mean absolute error.
Prominent characteristics of the proposed scheme are sparseness, locality, and physical
interpretability. In this work, every spectrum consists of 3821 fluxes, and 10, 19, and 14 typical
wavelength positions are detected for estimating Teff, log g and [Fe/H], respectively. It is shown
that the positions are related to typical lines of stellar spectra. This characteristic is important
in investigating physical indications from analysis results. Stellar spectra can then be described
by the individual fluxes at the detected positions (PD) or by local integration of fluxes near them
(LI). The abovementioned consistency is the result based on features described by LI. If features
are described by PD, the consistencies are 0.009092 dex for log Teff (124.545075 K for Teff),
0.198928 dex for log g, and 0.206814 dex for [Fe/H].

Subject headings: stars: atmospheres - stars: fundamental parameters - methods: statistical - methods: 
data analysis - stars: abundances 


1. Introduction 


Large-scale and deep sky survey missions, such
as the Sloan Digital Sky Survey (SDSS; York et al.
2000; Ahn et al. 2012), the Large Sky Area Multi-
Object Fiber Spectroscopic Telescope (LAMOST/
Guoshoujing Telescope; Zhao et al. 2006; Cui et al.
2012), and the Global Astrometric Interferometer
for Astrophysics (GAIA; Perryman et al. 2001;
Lobel et al. 2011), are collecting and will obtain a
large number of stellar spectra. To achieve scientific
goals and make full use of the potential value of
the observations, it is necessary to estimate the
atmospheric parameters (e.g. Teff, log g and [Fe/H])
directly from the spectrum and to statistically
investigate latent principles in the large spectral dataset.


This paper investigates the representation problem
of stellar spectra for physical parameter estimation,
which is a vital procedure in the aforementioned
tasks and is usually called feature extraction in
data mining, machine learning, and pattern recognition.
For example, in physical parameter estimation,
a spectrum can be represented by the observed
spectrum (Bailer-Jones 2000; Shkedy et al. 2007),
a corrected spectrum (Prieto et al. 2006), a
description of some typical lines (Muirhead et al.
2012; Mishenina et al. 2006), a statistical description
(Re Fiorentin et al. 2007), etc. Feature extraction
determines the applicable range of a data analysis
system, as well as its accuracy, efficiency, physical
interpretability, and robustness to noise and to
distortion from calibration error.

We propose a feature extraction scheme based
on the LASSO (least absolute shrinkage and selection
operator) algorithm (Tibshirani 1996) for stellar
spectra. The fundamental idea of this proposed
scheme is to statistically detect typical wavelength
positions that are significant/necessary
for discriminating stellar spectra with different
atmospheric physical parameters. In this study, the
proposed scheme successfully detects 10, 19, and
14 typical wavelength positions from 3 821 sample
points^1 for estimating the atmospheric parameters
Teff, log g and [Fe/H], respectively. In other words, a
spectrum can be described by 10, 19, or 14 of the
3 821 observed fluxes at the detected positions, or by
the local integrations of fluxes around the specific
positions. It is shown that the detected positions
are closely related to some spectral lines. In contrast
to the global method Principal Component Analysis
(PCA), which computes every feature
from nearly all observed fluxes, locality makes the
proposed scheme robust to the aggregated
influence of noise and calibration distortion.
Therefore, prominent characteristics of the proposed
scheme are sparseness and locality, which make
it easier to backtrack the specific effective factors
in estimating an atmospheric parameter than with
global methods.

^1 By '3 821 sample points', we mean that every spectrum is
described by 3821 fluxes in this study.

To evaluate the effectiveness of the detected
features, we investigate the atmospheric parameter
estimation problem based on the Support Vector
Regression (SVR) method (Smola et al. 2004;
Schölkopf et al. 2002) and the detected features.
Experimental results show excellent consistency
between the estimates of our proposed scheme and
those provided by the SDSS/SEGUE Spectroscopic
Parameter Pipeline (SSPP; Beers et al. 2006; Lee et al.
2008a,b; Prieto et al. 2008; Smolinski et al. 2011;
Lee et al. 2011) on a subsample of 20 000 stellar
spectra from SDSS. The SSPP of SLOAN estimates
the fundamental stellar parameters based on both
stellar spectra and ugriz photometry by multiple
techniques (Lee et al. 2008a) and a robust decision
tree scheme. The performance of the SSPP was also
investigated from multiple aspects (Prieto et al. 2008;
Lee et al. 2008b; Smolinski et al. 2011).


The proposed scheme is also evaluated on synthetic
stellar spectra with ground-truth parameters.
The synthetic spectra are computed based
on the New Grids of ATLAS9 Model Atmospheres
(Castelli et al. 2003). On the synthetic spectra, the
accuracies of the proposed scheme are 0.000801 dex
for log Teff, 0.017881 dex for log g and 0.013142 dex
for [Fe/H], where the accuracy is evaluated by mean
absolute error (MAE).


The rest of this paper is organized as follows.
We describe the stellar spectra used in this study
and the previously estimated physical parameters for
reference in section 2. In section 3 we introduce
our proposed feature extracting scheme and analyze
the extracted features. The parameterization model
of stellar spectra and the evaluation methods for
accuracy/consistency are introduced in section 4. In
section 5 we propose our feature description schemes
and present the parameterizing results. In section
6 the compactness of the detected features is evaluated.
In section 7 we evaluate the proposed scheme
on synthetic spectra and discuss the configuration
problem of the scheme. To highlight the characteristics
of our proposed scheme, related research is
reviewed and analyzed in section 8. Finally, we
summarize this work in section 9.


2. Data 


Fig. 1.— Scatter diagrams of the atmospheric parameters of the selected spectra:
(a) Teff and log g, (b) Teff and [Fe/H], (c) log g and [Fe/H].

Fig. 2.— Distribution of the atmospheric parameters of the selected spectra.

In this work, we use 50 000 stellar spectra from
SDSS/SEGUE observations (Yanny et al. 2009) and
their previously computed physical parameters from
the Seventh Sloan Data Release (Abazajian et al.
2009). The selected spectra span the ranges [4088,
9740] K in effective temperature Teff, [1.015000,
4.998000] dex in surface gravity log g, and [-3.497000,
0.268000] dex in metallicity [Fe/H]; additional
statistical information on the selected spectra is
presented in Fig. 1 and Fig. 2. All of the stellar
spectra are shifted to their rest frames (zero radial
velocity) based on the previously estimated radial
velocity provided by the SSPP and rebinned to a
maximal common log(wavelength) range [3.581862,
3.963961]^2 with a sampling step of 0.0001.
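The rebinning grid quoted above can be checked with a few lines of Python. This is only a minimal sketch, assuming the grid starts at the lower bound and advances by the stated step without exceeding the upper bound; the function name `log_wavelength_grid` is hypothetical and not part of the original pipeline.

```python
import math

LOG_MIN = 3.581862   # lower bound of the common log10(wavelength) range
LOG_MAX = 3.963961   # upper bound of the common log10(wavelength) range
STEP = 0.0001        # sampling step in log10(wavelength)

def log_wavelength_grid(log_min=LOG_MIN, log_max=LOG_MAX, step=STEP):
    """Return the log10(wavelength) sample positions (assumed layout)."""
    # Whole steps that fit inside the range, plus the first sample point.
    n_points = math.floor((log_max - log_min) / step) + 1
    return [log_min + i * step for i in range(n_points)]

grid = log_wavelength_grid()
wavelengths = [10.0 ** v for v in grid]   # wavelengths in Angstrom
print(len(grid), wavelengths[0])
```

Under these assumptions the grid has 3821 points, which agrees with the number of fluxes per spectrum stated in the text, and the first wavelength is close to 3818.23 Å.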

^2 Approximately, the common wavelength range is [3818.23,
9203.67] Å.

Our proposed scheme is a statistical
learning method. The fundamental idea of this
scheme is to discover a potentially predictive
relationship from empirical stellar spectra and the
corresponding atmospheric parameters, which are called
training data. At the same time, the performance of
the discovered predictive relationships should
be evaluated objectively. Therefore, a separate set
of stellar spectra is needed for evaluation, usually
called a test set in pattern recognition. On the other
hand, most learning methods tend to overfit the
empirical data. That is to say, statistical learning
methods can discover alleged relationships
in the training data that do not hold in general.
To avoid overfitting, we need some independent
spectra for objectively optimizing the parameters that
must be adjusted when investigating the potential
relationships; these independent spectra
and their reference parameters constitute a validation
set. Therefore, the selected stellar spectra are
partitioned into three subsets: a training set, a
validation set, and a test set. The sizes of the three
subsets are 20 000, 20 000 and 10 000 respectively.
The roles of the three subsets are presented in Table 1.
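The three-way partition can be illustrated as follows. The text gives only the subset sizes, so the uniformly random assignment, the fixed seed, and the function name `split_indices` are assumptions made for this sketch.

```python
import random

def split_indices(n_total=50_000, n_train=20_000, n_valid=20_000, seed=0):
    """Randomly partition spectrum indices into training, validation and test sets."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)      # fixed seed keeps the split reproducible
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]        # the remaining 10 000 spectra
    return train, valid, test

train_idx, valid_idx, test_idx = split_indices()
print(len(train_idx), len(valid_idx), len(test_idx))
```

Keeping the three index lists disjoint is what allows the validation set to tune parameters and the test set to report performance without leakage from the training data.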

In the training and evaluation process based on
SDSS spectra, we take the atmospheric parameters
previously estimated by the SDSS/SEGUE
Spectroscopic Parameter Pipeline (SSPP; Beers et al.
2006; Lee et al. 2008a,b; Prieto et al. 2008; Smolinski
et al. 2011; Lee et al. 2011) as a reference. The SSPP
of SLOAN estimates the fundamental stellar
atmospheric parameters based on both stellar spectra
and ugriz photometry by multiple techniques
(Lee et al. 2008a), for example: spectral fitting with
the k24 (Allende Prieto et al. 2012) and ki13 grids;
the extended WBG method (Wilhelm et al. 1999;
Lee et al. 2008a) based on theoretical ugr colors and
line parameters from synthetic spectra; nonlinear
neural network models trained on real SDSS spectra
or synthetic spectra (Re Fiorentin et al. 2007);
the minimization technique based on the synthetic
spectral libraries of the NGS1 and NGS2 grids; the
sensitive wavelength window selection methods G8 (CaI1)
and M8 (CaIIK1) based on the synthetic NGS1
grid; the Ca II K and autocorrelation function methods
(CaIIK2, CaIIK3, and ACF; Beers et al. 1999); the
M12 method based on the Ca II triplet lines; and the CaI2
and MgH methods based on the Ca I (4227 Å), Mg I b and
MgH features. The SSPP makes the final
decision by adaptively evaluating the reliabilities of the
multiple estimates of every atmospheric parameter
from a stellar spectrum and computing the weighted
average of the reliable estimates. By doing this, the
limitations of a specific technique can be alleviated to a
certain degree, for example, the restricted applicability
due to the coverage of the grids of synthetic
spectra used, the methods used for spectral matching
and their sensitivity to the signal-to-noise ratio of a
spectrum, the applicable range in parameter space,
etc. The SSPP was validated by comparing its
estimates with the sets of parameters obtained from
high-resolution spectra of SDSS-I/SEGUE stars
(Prieto et al. 2008), and with the available information
from the literature for stars in Galactic open and
globular clusters (Lee et al. 2008b; Smolinski et al.
2011). Therefore, consistency between the estimates of
a proposed method and the SSPP results can reflect
the performance of the method to a certain extent.

The proposed scheme was also evaluated on synthetic
spectra with ground-truth parameters. The
synthetic spectra and the experiments are introduced
in section 7.


3. Feature Extraction 


We investigate the feature extraction problem
by using the LASSO algorithm (Tibshirani 1996;
Efron et al. 2004) for automatically estimating
atmospheric parameters from stellar spectra. Suppose
the training set is represented by

$$S_{tr} = \{(x^i, y_i),\ i = 1, 2, \cdots, N\}, \tag{1}$$

where $x^i = (x^i_1, \cdots, x^i_p)^T$ is an observed spectrum and
$y_i$ is the corresponding atmospheric parameter^3, $x^i_j$ is
a specific observed flux, and $N$ is the size of the training
data set (in this study, $N$ = 20 000). Let $(x, y)$ represent
a general stellar spectrum and its corresponding
atmospheric parameter under consideration, where

$$x = (x_1, \cdots, x_p)^T. \tag{2}$$

The validation set and testing set can be represented
similarly by $S_{val}$ and $S_{te}$.

Table 1: Roles of the three data sets.

Data set         Roles
---------------  ------------------------------------------------------------
Training set     Used in
                 1) detecting features (Section 3);
                 2) estimating the preprocessing parameters {μ_j} (equation (8))
                    and {σ_j} (equation (9));
                 3) learning the parameterizing model (Section 4).
Validation set   Used in
                 1) estimating the feature description parameter k in
                    equations (25), (26) and (27);
                 2) feature evaluation & refinement (Section 6).
Testing set      Used in performance evaluation (Section 4.2).


3.1. Preprocessing 

In feature analysis, we conduct the following
preprocessing procedures:

• Replace Teff with log Teff to reduce the dynamical
range and to better represent the uncertainties
of spectral data (Re Fiorentin et al. 2007).

• Normalize the features by setting every variable
to zero mean and unit variance, which
helps to put all of the variables on an equal
footing. That is to say, the spectrum in equation
(2) is transformed into

$$\tilde{x} = (\tilde{x}_1, \cdots, \tilde{x}_p)^T \tag{3}$$

and the training set in equation (1) is transformed
into

$$\tilde{S}_{tr} = \{(\tilde{x}^i, y_i),\ i = 1, 2, \cdots, N\}, \tag{4}$$

^3 In this paper, $y_i$ can be effective temperature, surface gravity,
or metallicity. The stellar spectra are analyzed three times,
once for each of the three parameters.


Table 2: Detected typical positions for estimating
Teff from SDSS stellar spectra. TPW $\lambda^w$: typical
position in wavelength; TPL $\lambda^l$: typical position in
log(wavelength); TP: typical position.

label   TPW λ^w (Å)   TPL λ^l   lines near TP
-----   -----------   -------   ----------------
T1      3840.2721     3.5844    Fe I
T2      3936.0626     3.5951    KP, Ca II K
T3      3936.9690     3.5952    KP, Ca II K
T4      3969.7394     3.5988    Ca II HKp, Heps
T5      4341.7219     3.6377    Hγ
T6      4680.1740     3.6703    CC12
T7      5182.7708     3.7146    MgH+Mg I
T8      6569.9490     3.8176    Hα, CaH
T9      9148.7551     3.9614    Fe I, O I
T10     9150.8619     3.9615    Fe I, O I
where

$$\tilde{x}^i = (\tilde{x}^i_1, \cdots, \tilde{x}^i_p)^T, \tag{5}$$

$$\tilde{x}^i_j = \frac{x^i_j - \mu_j}{\sigma_j}, \tag{6}$$

$$\tilde{x}_j = \frac{x_j - \mu_j}{\sigma_j}, \tag{7}$$

$$\mu_j = \frac{1}{N} \sum_{i=1}^{N} x^i_j, \tag{8}$$

$$\sigma_j = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x^i_j - \mu_j)^2}. \tag{9}$$

The validation set and testing set are preprocessed
similarly by equation (7) based on the parameters $\mu_j$
in equation (8) and $\sigma_j$ in equation (9), and converted
into $\tilde{S}_{val}$ and $\tilde{S}_{te}$.

Multiple statistical procedures are performed
in this paper. For readability, a
flowchart is presented in Fig. 3 to demonstrate the
end-to-end flow of the analysis.

3.2. Detect Features

In the LASSO scheme, features are identified by
the following model

$$(\hat{\alpha}, \hat{\beta}) = \arg\min \left\{ \sum_{i=1}^{N} \Big( y_i - \alpha - \sum_{j=1}^{p} \beta_j \tilde{x}^i_j \Big)^2 \right\} \tag{10}$$

subject to

$$\sum_{j=1}^{p} |\beta_j| \le t, \tag{11}$$

where $t > 0$ is a preset parameter, and $\alpha$ and
$\beta = (\beta_1, \cdots, \beta_p)^T$ are parameters to be optimized. In
the model (10), only a few $\beta_j$ will be nonzero, and
the wavelength positions of the corresponding $x_j$ or
$\tilde{x}_j$ with nonzero $\beta_j$ are exactly the detected positions
of spectral features. In this model, the parameter $t$
controls the sparsity of the solution, where sparsity
refers to the number of detected features. In this work,
the parameter $t$ is estimated by 10-fold cross validation
(Tibshirani 1996; Sjöstrand 2005).

Detected features are presented visually in Fig. 4,
and their specific wavelength positions are listed in
Table 2, Table 3 and Table 4. A specific feature can
be referred to by its label, its position in wavelength,
or its position in log(wavelength). For example, $\lambda^w_{T2}$ and $\lambda^l_{T2}$
refer to the position of the second feature of Teff in
wavelength and in log(wavelength) respectively (Table 2),
and $\lambda^w_{L3}$ and $\lambda^l_{L3}$ represent the position of the third
feature of log g in wavelength and log(wavelength)
respectively (Table 3):

$$\lambda^w_{T2} = 3936.0626\ \text{Å}, \tag{12}$$

$$\lambda^l_{T2} = 3.5951\ \text{dex}, \tag{13}$$

$$\lambda^w_{L3} = 3839.3880\ \text{Å}, \tag{14}$$

$$\lambda^l_{L3} = 3.5843\ \text{dex}. \tag{15}$$


To facilitate finding the characteristics of the detected
features, we also show the features in some
close-range views in Fig. 5, Fig. 6 and Fig. 7.
In this work, spectral features are extracted by the
following two procedures: 1) detect the positions of
spectral features where, in theory, the spectral fluxes
vary with the parameter; 2) describe
the features based on one or several fluxes near the
detected positions (Section 5). To highlight the variance
of fluxes at one specific detected position, we
sometimes use the term 'feature' instead of 'position' or
'descriptor' (Fig. 4, Fig. 5, Fig. 6 and Fig. 7). The
variance is closely related to the discriminability of
a spectrum, and is essential for a good feature.
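The standardization and LASSO detection steps described in this section can be sketched on toy data. This is not the implementation used in the paper: it solves the penalized (Lagrangian) form of the LASSO by coordinate descent, which has the same solutions as the constrained form for a matching penalty, and it fixes the penalty by hand instead of selecting it by 10-fold cross validation. All names and the toy data are hypothetical.

```python
import random

def standardize(columns):
    """Scale every column (one flux position) to zero mean and unit variance."""
    scaled = []
    for col in columns:
        n = len(col)
        mu = sum(col) / n
        sigma = (sum((v - mu) ** 2 for v in col) / n) ** 0.5
        scaled.append([(v - mu) / sigma for v in col])
    return scaled

def soft_threshold(value, penalty):
    """Shrink value toward zero; values inside the penalty band become exactly zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_coordinate_descent(columns, y, penalty, n_sweeps=100):
    """Minimise (1/2N) * sum(residual^2) + penalty * sum(|beta_j|)."""
    n = len(y)
    alpha = sum(y) / n                     # optimal intercept for centred columns
    beta = [0.0] * len(columns)
    residual = [yi - alpha for yi in y]
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            # Covariance of column j with the residual that excludes its own term.
            rho = sum(c * (r + beta[j] * c) for c, r in zip(col, residual)) / n
            scale = sum(c * c for c in col) / n
            updated = soft_threshold(rho, penalty) / scale
            delta = updated - beta[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                beta[j] = updated
    return alpha, beta

# Toy "spectra": 200 samples, 30 flux positions; only positions 4 and 17 carry signal.
rng = random.Random(0)
n_samples, n_positions = 200, 30
raw = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_positions)]
columns = standardize(raw)
y = [2.0 * columns[4][i] - 1.5 * columns[17][i] + 0.1 * rng.gauss(0.0, 1.0)
     for i in range(n_samples)]

alpha, beta = lasso_coordinate_descent(columns, y, penalty=0.2)
detected = [j for j, b in enumerate(beta) if b != 0.0]
print(detected)
```

The indices with nonzero coefficients play the role of the detected wavelength positions; a larger penalty (a smaller $t$) yields fewer detected positions.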

The proposed feature extracting technique has the
following advantages:

• Interpretability The detected features all
have specific wavelength positions, based on
which we can backtrack the specific effective
factors (Fig. 5, Fig. 6 and Fig. 7) and evaluate
their contributions to estimating the atmospheric
parameters from stellar spectra (Section 6).
For example, Hγ is a line sensitive
to surface temperature (T5 in Table 2 and
Fig. 5(b)); Ca II, H I, Hδ, and Ca I are sensitive
to surface gravity (L9 & L10 in Table 3,
Fig. 6(b) and Fig. 6(c)); Hα and CaH are sensitive
to both surface temperature and gravity
(T8 in Table 2 and Fig. 5(e); L18 in Table 3
and Fig. 6(g)); the Ca II line (L19 in Table 3 and
Fig. 6(h); F12 in Table 4 and Fig. 7(e)) is an
effective factor for both surface gravity and stellar
metal abundance (Cenarro et al. 2001).

• Efficiency Very few features are detected for
every parameter estimation problem, and every
feature can be described by, at most, 17
fluxes near the detected wavelength position
(Section 5). Therefore, it is very efficient to
compute the features and estimate the atmospheric
parameters from stellar spectra based
on this scheme. For example, only 10 features
need to be computed to estimate Teff, 19 features
to estimate log g, and 14 features to
estimate [Fe/H]. More analysis on efficiency is
presented later in this paper.

• Good generalization In this study, every
spectrum is described by 3 821 fluxes.
LASSO identifies 10 local features to estimate
Teff, 19 features to estimate log g, and 14
features to estimate [Fe/H], and the parameterization
results are excellent compared with
similar studies in the literature (Section 5
and Re Fiorentin et al. 2007). Therefore, the
proposed scheme enhances the generalization
performance by rejecting redundancy, which
usually does not improve the performance of the
estimating system and only introduces disturbances
and overfitting.

• High robustness The commonly used
method PCA is a global scheme. In PCA,
every feature is computed from nearly all or
most of the observed fluxes. This contributes
to the accumulation of negative influence from
noise, observation error, and calibration distortion.
Our proposed method can determine
the specific positions of effective features and
obtain their descriptions from only one or several
observed fluxes near the detected positions
(Section 5). Therefore, in theory this scheme is more
robust to the aforementioned undue
influences, and this is also validated
by the excellent performance on SDSS/SEGUE
spectra.

Fig. 3.— A flowchart showing the order in which the statistical procedures are used in the analysis.


4. Non-linear Regression Model for Atmospheric
Parameter Estimation and Evaluation Scheme

Let

$$x^F = (x^F_{(1)}, \cdots, x^F_{(q)})^T \tag{16}$$

represent a stellar spectrum $x$ in equation (2)
based on the features detected in section 3, where
$q > 0$ is the number of extracted features. Based on
the spectral features, the training set in equation (1)
can be denoted by

$$S^F_{tr} = \{(x^{F,i}, y_i),\ i = 1, 2, \cdots, N\}, \tag{17}$$

where $x^{F,i} = (x^{F,i}_{(1)}, \cdots, x^{F,i}_{(q)})^T$. Similarly, based on the
extracted features, the validation set and test set can
be denoted by $S^F_{val}$ and $S^F_{te}$.


4.1. Estimation model for atmospheric pa¬ 
rameters 


We utilize the Support Vector Regression (SVR)
algorithm^4 (Smola et al. 2004; Schölkopf et al. 2002)
to estimate the mapping between stellar spectra and
atmospheric parameters. The SVR estimation can
be described by

$$f(x^F) = \sum_{m=1}^{l} \alpha_m k(\hat{x}^F_m, x^F) + b \tag{18}$$

^4 Support Vector Machine (SVM) is a learning algorithm that
can be used for classification and regression. To be unambiguous,
it is denoted by Support Vector Classification (SVC) and
Support Vector Regression (SVR) in scenarios of recognition
and estimation respectively.
Fig. 4.— Detected features for estimating the atmospheric parameters from SDSS stellar spectra. Black
curves are stellar spectra with different parameters, red stars mark the positions of the detected features, and
vertical dashed lines help show the representativeness of the detected features. The horizontal
axis and vertical axis represent wavelength (Å) and flux respectively.

Fig. 5.— Close-range observations of the detected features for estimating Teff.

Fig. 6.— Close-range observations of the detected features for estimating log g.

Fig. 7.— Close-range observations of the detected features for estimating [Fe/H].






































Table 3: Detected typical positions for estimating
log g from stellar spectra. TPW $\lambda^w$: typical position
in wavelength; TPL $\lambda^l$: typical position in
log(wavelength); TP: typical position.

label   TPW λ^w (Å)   TPL λ^l   lines near TP
-----   -----------   -------   ------------------------
L1      3832.3221     3.5835    Mg I, Fe I, He I, Na I
L2      3838.5040     3.5842    He I, Mg I, V I
L3      3839.3880     3.5843    Fe I, Fe V
L4      3870.4548     3.5878    H8
L5      3871.3461     3.5879    H8
L6      3932.4390     3.5947    KP, Ca II K
L7      3936.0626     3.5951    KP, Ca II K
L8      3936.9690     3.5952    KP, Ca II K
L9      3970.6536     3.5989    Ca II, H I
L10     4099.7937     3.6128    Hδ, Ca I
L11     4179.8625     3.6212    V I
L12     4215.6253     3.6249    Ca I
L13     4566.2743     3.6596    Ba
L14     5183.9643     3.7147    Mg I, Mg Ic
L15     5185.1581     3.7148    Mg I, Mg Ic
L16     5252.4509     3.7204    Fe H
L17     5783.1173     3.7622    Fe II, Fe I, O II, V I
L18     6566.9241     3.8174    Hα, CaH
L19     8544.0150     3.9317    Ca II


where $k(\cdot, \cdot)$ is a kernel function^5 and $\{\hat{x}^F_m,\ m =
1, \cdots, l\}$ are some members of the training spectra in
equation (17) (called support vectors in the literature;
Vapnik). In SVR, the estimation model (18)
is learnt from the training set $S^F_{tr}$ based on the
structural risk minimization principle, which combines
empirical error and model complexity evaluation.
Extensive research shows that this model has
excellent generalization capacity. A typical characteristic
of SVR is that the set of support vectors
usually consists of a small fraction of the training
samples; therefore, the obtained model is very efficient,
which is important for large data processing.
In this work, we used the implementation of SVR in
Chang and Lin (2001).
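Once a model has been trained, evaluating the SVR estimate in equation (18) is a weighted sum of kernel values. The sketch below only illustrates that prediction step; the support vectors, coefficients, offset and kernel width are made-up values, and the kernel is written in the common form exp(-gamma * ||u - v||^2), which is an assumption about the parametrization.

```python
import math

def gaussian_kernel(u, v, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

def svr_predict(x, support_vectors, alphas, b, gamma):
    """Evaluate f(x) = sum_m alpha_m * k(sv_m, x) + b."""
    return sum(a * gaussian_kernel(sv, x, gamma)
               for a, sv in zip(alphas, support_vectors)) + b

# Hypothetical trained model with two support vectors in a 3-feature space.
support_vectors = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]
alphas = [2.0, -1.0]
estimate = svr_predict([0.0, 1.0, 2.0], support_vectors, alphas, b=0.5, gamma=0.5)
print(estimate)
```

The cost of one prediction grows with the number of support vectors and the number of features, which is why a small support set and 10 to 19 features per spectrum keep the estimation step cheap.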


4.2. Evaluation methods 

Suppose $S^F_{te} = \{(x^{F,m}, y_m),\ m = 1, 2, \cdots, M\}$ is a
test set. In this work, the performance of the proposed
scheme is evaluated by Mean Absolute Error
(MAE) and Standard Deviation (SD). They are defined
as follows:

$$\mathrm{MAE} = \frac{1}{M} \sum_{m=1}^{M} |e_m|, \tag{19}$$


Table 4: Detected typical positions for estimating
[Fe/H] from SDSS stellar spectra. TPW $\lambda^w$: typical
position in wavelength; TPL $\lambda^l$: typical position in
log(wavelength); TP: typical position.

label   TPW λ^w (Å)   TPL λ^l   lines near TP
-----   -----------   -------   ------------------------
F1      3833.2046     3.5836    O II, F I, Ca III, He I
F2      3834.0873     3.5837    Fe I, O VI, N I, Fe II
F3      3869.5637     3.5877    Fe I
F4      3932.4390     3.5947    KP, Ca II K
F5      3933.3446     3.5948    KP, Ca II K
F6      3966.9982     3.5985    Ca II HKp, Heps
F7      3969.7394     3.5988    Ca II HKp, Heps
F8      4021.2586     3.6044    He I
F9      4038.8898     3.6063    He I
F10     4213.6844     3.6247    Ca I
F11     5891.9900     3.7703    Na I, Na
F12     8544.0150     3.9317    Ca II, Ca IIa
F13     8959.0508     3.9523    Fe I, Fe H, Ne II
F14     8961.1140     3.9524    Fe I


$$\mathrm{SD} = \sqrt{\frac{1}{M} \sum_{m=1}^{M} (e_m - \bar{e})^2}, \tag{20}$$

where $e_m$ is the error/difference between the reference
value of a stellar parameter and its estimate,

$$e_m = y_m - \hat{y}_m, \quad m = 1, \cdots, M, \tag{21}$$

and $\bar{e} = \frac{1}{M} \sum_{m=1}^{M} e_m$.

MAE and SD are both widely used in evaluating
the performance of an estimation scheme. Each of the
two measures focuses on a different aspect
of an estimation method. MAE measures the average
magnitude of the deviation, ignoring the
sign/direction of the error. SD shows how much variation
exists in the estimation error, and reflects the
stability/robustness of an estimation scheme. A low
SD indicates that the performance of the proposed
estimation scheme is very stable; a high SD indicates
that its performance is sensitive to the specific spectrum
being processed.
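The two evaluation measures can be written out directly. This is a minimal sketch with made-up numbers; it divides by M inside the standard deviation, which is an assumption (dividing by M - 1 is also common and makes no practical difference for a test set of 10 000 spectra).

```python
import math

def mean_absolute_error(reference, estimate):
    """Average magnitude of the errors e_m = reference_m - estimate_m."""
    errors = [y - f for y, f in zip(reference, estimate)]
    return sum(abs(e) for e in errors) / len(errors)

def error_standard_deviation(reference, estimate):
    """Spread of the errors around their mean (population form, divides by M)."""
    errors = [y - f for y, f in zip(reference, estimate)]
    mean_error = sum(errors) / len(errors)
    return math.sqrt(sum((e - mean_error) ** 2 for e in errors) / len(errors))

reference = [1.0, 2.0, 3.0]    # hypothetical reference parameters
estimate = [1.5, 2.0, 2.0]     # hypothetical estimates
print(mean_absolute_error(reference, estimate))
print(error_standard_deviation(reference, estimate))
```

Here the errors are -0.5, 0 and 1, so the MAE is 0.5 while the SD is larger because the errors have opposite signs and scatter around a small mean.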


^5 A Gaussian kernel is used in our experiments unless otherwise
stated.

5. Feature Description and its Application
in Atmospheric Parameter Estimation


Suppose $x$ and $\tilde{x}$ are a spectrum in equation
(2) and its preprocessed version in equation (3), and
$\lambda^l$ is a given position of a detected feature in
log(wavelength). For ease of introducing the feature
description, we assume $\tilde{x}(\lambda^l)$ represents the
preprocessed flux of spectrum $x$ at log(wavelength)
position $\lambda^l$.

In section 3 we detect the positions of features
from stellar spectra. A direct description of the features
is simply to pick the observed fluxes at the detected
positions:

$$x^F_{(m)} = \tilde{x}(\lambda^l_{Tm}), \quad m = 1, \cdots, 10 \tag{22}$$

for Teff (Table 2),

$$x^F_{(m)} = \tilde{x}(\lambda^l_{Lm}), \quad m = 1, \cdots, 19 \tag{23}$$

for log g (Table 3), and

$$x^F_{(m)} = \tilde{x}(\lambda^l_{Fm}), \quad m = 1, \cdots, 14 \tag{24}$$

for [Fe/H] (Table 4). The labels $\lambda^l_{Tm}$, $\lambda^l_{Lm}$ and
$\lambda^l_{Fm}$ are defined in section 3. Experimental results
based on this kind of description are presented in Table 5.
In this scheme, only 10 observed fluxes are
picked directly and used for estimating Teff, 19
observed fluxes for estimating log g, and 14 observed
fluxes for estimating [Fe/H]. Therefore, it is very efficient
to extract features in application. The performance
of the proposed scheme is also excellent
compared with a similar study in the literature (Table 2
in Re Fiorentin et al. 2007), in which 50 PCA features
were used, every feature was computed from
approximately 2 000 observed fluxes, and the MAE is
0.0126 for log Teff, 0.3644 for log g and 0.1949 for
[Fe/H]. More direct comparisons are presented in
section 8.

However, real spectra are inevitably corrupted by 
noise, which usually degrades accuracy. Therefore, 


Table 5: Consistency/accuracy on the test set with features
described by the observed fluxes at the detected
typical positions.

evaluation method   log Teff   log g      [Fe/H]
-----------------   --------   --------   --------
MAE                 0.009092   0.198928   0.206814
SD                  0.012978   0.282752   0.274245


to further improve accuracy, we propose the following
feature description method based on the local average
of preprocessed spectral fluxes in a local area
around the detected positions:

$$x^F_{(m)} = \sum_{j=-k}^{k} \tilde{x}(\lambda^l_{Tm} + j \times \Delta\lambda), \quad m = 1, \cdots, 10 \tag{25}$$

for Teff (Table 2),

$$x^F_{(m)} = \sum_{j=-k}^{k} \tilde{x}(\lambda^l_{Lm} + j \times \Delta\lambda), \quad m = 1, \cdots, 19 \tag{26}$$

for log g (Table 3), and

$$x^F_{(m)} = \sum_{j=-k}^{k} \tilde{x}(\lambda^l_{Fm} + j \times \Delta\lambda), \quad m = 1, \cdots, 14 \tag{27}$$

for [Fe/H] (Table 4), where $k > 0$ is an integer representing
the radius of integration, and $\Delta\lambda$ is the sampling
step of a spectrum, whose value is 0.0001 in this work
(Section 2). For convenience, we name the two describing
methods Point Description (PD) and Local
Integration (LI) respectively.
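The two description methods can be compared on a stand-in spectrum. This sketch assumes that a tabulated log(wavelength) position maps to the nearest sample on a grid starting at 3.581862 with step 0.0001, and it takes LI as a plain sum over the 2k + 1 neighbouring samples (dividing by 2k + 1 would give the local average). The helper names are hypothetical, and positions are assumed to lie at least k samples from either end of the spectrum.

```python
LOG_MIN = 3.581862   # first log10(wavelength) sample of the rebinned spectra
STEP = 0.0001        # sampling step in log10(wavelength)

def position_to_index(log_position, log_min=LOG_MIN, step=STEP):
    """Nearest sample index for a tabulated log(wavelength) position."""
    return round((log_position - log_min) / step)

def point_description(flux, log_positions):
    """PD: one preprocessed flux per detected position."""
    return [flux[position_to_index(p)] for p in log_positions]

def local_integration(flux, log_positions, k):
    """LI: sum of the 2k+1 preprocessed fluxes centred on each detected position."""
    features = []
    for p in log_positions:
        centre = position_to_index(p)
        features.append(sum(flux[centre - k:centre + k + 1]))
    return features

flux = [float(i) for i in range(3821)]   # stand-in spectrum: flux equals sample index
teff_positions = [3.5951, 3.6377]        # T2 and T5 from Table 2
print(point_description(flux, teff_positions))
print(local_integration(flux, teff_positions, k=6))
```

With k = 6 each LI feature uses 13 fluxes, and with the largest radius used in this work (k = 8) it uses 17, matching the upper bound on fluxes per feature quoted in section 3.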

The theoretical foundation of the proposed feature
description method LI in equations (25), (26), and (27)
is the law of large numbers (LLN) in probability theory.
A preprocessed spectral flux $\tilde{x}(\lambda^l)$ consists of a
theoretical-spectral component and a noise term

$$\tilde{x}(\lambda^l) = \tilde{x}_{th}(\lambda^l) + \epsilon(\lambda^l), \tag{28}$$

where $\tilde{x}_{th}(\lambda^l)$ is a theoretical flux without contamination
from noise at log(wavelength) $\lambda^l$, and $\epsilon(\lambda^l)$
is the noise at the corresponding position. Suppose
$\{\epsilon(\lambda^l),\ \lambda^l \in [3.581862, 3.963961]\}$ is a set of
independent and identically distributed random variables
drawn from distributions with zero mean and
finite variances. The LLN states that the average
of noises $\frac{1}{2k+1} \sum_{j=-k}^{k} \epsilon(\lambda^l + j \times \Delta\lambda)$ converges in
probability and almost surely to the expected value 0
as $k \to \infty$, where $\lambda^l < 3.963961 - k \times 0.0001$ and
$\lambda^l > 3.581862 + k \times 0.0001$. In other words, the
negative effect of noise diminishes
toward zero as $k$ increases. Similarly, information
from the theoretical fluxes in observed spectra is
also erased gradually as $k$ increases in equations
(25), (26), and (27). Therefore, the performance of
parameter estimation on the whole increases at the
beginning, and once the effect of erasing theoretical
fluxes overpowers the effect of diminishing noise, the
performance degrades (Fig. 8). In this work, we
obtain the optimal $k$ based on the performance of the
proposed scheme on the validation set. The optimized $k$ are
6, 2 and 8 for Teff, log g, and [Fe/H], respectively.
Final results are presented in Table 6, Fig. 9 and
Fig. 10. It is shown that the accuracy of the estimation
based on the LI description is improved.
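The noise-suppression half of this argument can be checked numerically. The sketch below assumes independent Gaussian noise with unit variance, which is an idealization rather than a model of real SDSS noise, and compares the empirical spread of the averaged noise with the expected 1/sqrt(2k + 1).

```python
import random
import statistics

def noise_average_spread(k, n_trials=2000, seed=0):
    """Empirical standard deviation of the mean of 2k+1 unit-variance noise draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        window = [rng.gauss(0.0, 1.0) for _ in range(2 * k + 1)]
        means.append(sum(window) / len(window))
    return statistics.pstdev(means)

for k in (0, 2, 6, 8):
    # Columns: radius, simulated spread, theoretical spread 1/sqrt(2k+1).
    print(k, round(noise_average_spread(k), 3), round((2 * k + 1) ** -0.5, 3))
```

The simulation covers only the noise term; the competing loss of spectral information for large k is why the radius is tuned on the validation set rather than made as large as possible.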


In the proposed LI approach, all of the detected
features share a common smoothing parameter k.
In reality, however, the detected features may differ
from each other in scale. Therefore, two more
elaborate schemes are to estimate an independent
smoothing scale for every feature based on a validation
set, or to determine the scales adaptively while
detecting features; for example, the fused lasso method
can detect the supporting interval for every feature
(Tibshirani et al. 2005; Ye et al. 2011).


6. Feature Evaluation and Refinement 

In this section we investigate the compactness of 
the detected features. By compactness, we mean to 
study whether there is any redundancy in the set of 
detected features and how to detect and refine the 
features if any redundancy exists. 

First, we introduce a measure to evaluate the
significance/necessity of a feature. For ease of description,
we take the evaluation of the features in Table
2 for Teff as an example. The full set of the features
can be denoted by

$$F_{Teff} = \{T_j,\ 1 \le j \le N_{Teff}\}, \tag{29}$$

and $F^i_{Teff}$ represents the subset of features obtained by deleting
$T_i$ from $F_{Teff}$:

$$F^i_{Teff} = F_{Teff} - \{T_i\} = \{T_j,\ 1 \le j \le N_{Teff},\ j \ne i\}, \tag{30}$$

where $N_{Teff}$ represents the number of detected features
for Teff in Table 2 and $i = 1, \cdots, N_{Teff}$. In

Table 6: Accuracy/consistency on the test set with features
described by the local integral near the detected
typical positions. Integral radii are 6 for log Teff, 2
for log g and 8 for [Fe/H] respectively.

evaluation method   log Teff   log g      [Fe/H]
-----------------   --------   --------   --------
MAE                 0.007458   0.189557   0.182060
SD                  0.011189   0.270496   0.248504


this work, $N_{Teff}$ is 10. We propose to evaluate the
significance of $T_i$ by

$$S(T_i) = \mathrm{MAE}(F^i_{Teff}) - \mathrm{MAE}(F_{Teff}), \tag{31}$$

where $\mathrm{MAE}(F^i_{Teff})$ and $\mathrm{MAE}(F_{Teff})$ represent the
mean absolute error of the estimation of the atmospheric
parameter Teff based on the features $F^i_{Teff}$ and $F_{Teff}$
respectively. If a feature $T_i$ is completely redundant,
in theory the estimation performance should be
unaffected after deleting it, and $S(T_i)$ should be zero.
On the other hand, if feature $T_i$ is essential for
estimating the atmospheric parameter Teff, then the
accuracy should noticeably deteriorate after deleting
it. Therefore, the proposed measure $S$ expresses the
necessity of the detected features for parameter
estimation. Evaluation results for the features in Table 2
are presented in the second column of Table 7. The
features of log g and [Fe/H] (Table 3 and Table 4)
can be evaluated similarly, and the corresponding results
are presented in the second column of Table 8 and
Table 9 respectively.

The magnitude of the MAE varies from problem to problem; for example, in Table 5 and Table 6 the MAE of Teff is noticeably smaller than that of log g and [Fe/H]. This magnitude bounds the values that the significance S can take, so S is not directly comparable between parameters. Therefore, we introduce the following relative evaluation scheme


$S^r(T_i) = \frac{MAE(F^i_{Teff}) - MAE(F_{Teff})}{MAE(F^i_{Teff})}$. (32)


Similarly, $F_{\log g}$, $F^i_{\log g}$, $S(L_i)$, $S^r(L_i)$, $F_{[Fe/H]}$, $F^i_{[Fe/H]}$, $S(F_i)$ and $S^r(F_i)$ can be defined for the features of log g and [Fe/H].

The corresponding results are presented in the third column of Table 7, Table 8 and Table 9 respectively. For convenience, we name the evaluation schemes S in equation (31) and $S^r$ in equation (32) the Significance (S) measure and the Relative Significance (RS) measure respectively. The RS measure can be regarded as a standardized variant of the S measure.
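The S and RS measures amount to a leave-one-feature-out loop, which the following Python sketch makes explicit. It rests on stated assumptions: `mae_of` is a hypothetical callable that trains the estimator on a given feature subset and returns its MAE on the validation set, and the relative measure divides S by the MAE obtained after deleting the feature, which is how equation (32) is read here. The toy error function only exercises the arithmetic and is not real data.

```python
def significance(features, mae_of):
    """Return {feature: (S, S_r)} for every feature in `features`.

    mae_of -- hypothetical callable: list of features -> validation MAE of a
              model trained on exactly those features.
    """
    mae_full = mae_of(features)
    result = {}
    for f in features:
        reduced = [g for g in features if g != f]   # F^i: delete one feature
        mae_reduced = mae_of(reduced)
        s = mae_reduced - mae_full                  # S measure, equation (31)
        s_r = s / mae_reduced                       # RS measure, equation (32) as read here
        result[f] = (s, s_r)
    return result


def toy_mae(features):
    """Toy stand-in for training and validation: only feature 'a' matters."""
    return 0.25 + (0.0 if "a" in features else 0.25)


scores = significance(["a", "b"], toy_mae)
```

In this toy case the essential feature "a" receives a positive score and the redundant feature "b" receives zero, which is the behaviour described for S in the text.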

The above evaluation results show that:

1) For the detected features for Teff and [Fe/H], there is no sufficient evidence of redundancy (Table 7, Table 9).

2) The evaluation results in Table 8 show that the detected features for log g contain three redundant features (L1, L4 and L7), most probably because of overfitting.

3) Available evidence shows that T3, F13 and F14 are most probably non-significant (Table 7, Table 9).

Fig. 8. Variation of the Mean Absolute Error (MAE) on the validation set with integral radius R. Subfigures (a), (b), and (c) show the MAEs of the estimations with different integration radii on the validation set for log Teff, log g and [Fe/H] respectively.

Fig. 9. Consistency. We compare our estimates of log Teff, log g and [Fe/H] with the corresponding reference values provided by the SSPP of SLOAN on the test set. The horizontal axis shows the reference parameters provided by the SSPP of SLOAN and the vertical axis shows the estimates of our proposed method. In this experiment, features are described by the LI method.

Fig. 10. Discrepancy and bias of the estimation. We compare our estimates of log Teff, log g, and [Fe/H] with the corresponding reference values provided by the SSPP of SLOAN on the test set. The horizontal axis is the difference between the reference parameter provided by the SSPP of SLOAN and the estimate of our proposed method. The vertical axis is the estimated probability density of the difference on the test set, and the red curve is a fit of the density by a Gaussian distribution with identical mean and variance. In this experiment, features are described using the LI method.


Table 7: Compactness of the detected features in Table 2 for estimating Teff from stellar spectra. $S_{va}$ and $S^r_{va}$ are the evaluated values of the significance measure and the relative significance measure respectively on the validation set. MAE is the mean absolute error on the test set based on the features $F^i_{Teff}$. Items are sorted in decreasing order of $S^r_{va}$.

label   S_va        S^r_va    MAE
T7       0.001587    0.2128   0.009211
T8       0.000985    0.1321   0.008380
T5       0.000638    0.0855   0.008139
T6       0.000588    0.0788   0.007980
T1       0.000082    0.0110   0.007570
T4       0.000057    0.0076   0.007498
T9       0.000019    0.0025   0.007521
T10      0.000018    0.0024   0.007517
T2       0.000010    0.0013   0.007466
T3      -0.000003   -0.0004   0.007461


Table 8: Compactness of the detected features in Table 3 for estimating log g from stellar spectra. $S_{va}$ and $S^r_{va}$ are the evaluated values of the significance measure and the relative significance measure respectively on the validation set. MAE is the mean absolute error on the test set based on the features $F^i_{\log g}$. Items are sorted in decreasing order of $S^r_{va}$.

label   S_va        S^r_va    MAE
L16      0.003656    0.0193   0.193511
L18      0.003182    0.0168   0.192749
L9       0.002700    0.0142   0.192719
L15      0.001455    0.0077   0.190796
L14      0.001067    0.0056   0.190436
L17      0.000948    0.0050   0.191023
L13      0.000884    0.0047   0.190412
L8       0.000752    0.0040   0.189936
L6       0.000675    0.0036   0.189539
L10      0.000577    0.0030   0.189894
L11      0.000560    0.0030   0.189815
L12      0.000548    0.0029   0.190194
L19      0.000530    0.0028   0.189892
L5       0.000314    0.0017   0.189607
L3       0.000200    0.0011   0.189695
L2       0.000013    0.0001   0.189978
L4      -0.000071   -0.0004   0.189228
L7      -0.000091   -0.0005   0.189375
L1      -0.000892   -0.0047   0.189230


Overall, although the compactness of the detected features is excellent, some redundant and non-significant features remain. Fortunately, the magnitude of the relative significance $S^r$ of the redundant and non-significant features is evidently smaller than that of the others. Therefore, they can be detected by checking whether the relative significance $S^r$ of a feature is smaller than a preset threshold, for example 0.001.
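The thresholding rule can be sketched in a few lines of Python. The relative significance values below are copied from Table 7 (features for Teff); the function name `refine` and the dictionary layout are illustrative assumptions, not part of the original implementation.

```python
# Relative significance values copied from Table 7 (features for Teff).
relative_significance = {
    "T7": 0.2128, "T8": 0.1321, "T5": 0.0855, "T6": 0.0788, "T1": 0.0110,
    "T4": 0.0076, "T9": 0.0025, "T10": 0.0024, "T2": 0.0013, "T3": -0.0004,
}


def refine(s_r, threshold=0.001):
    """Split features into (kept, flagged) by comparing S^r with a threshold."""
    kept = [f for f, v in s_r.items() if v >= threshold]
    flagged = [f for f, v in s_r.items() if v < threshold]
    return kept, flagged


kept, flagged = refine(relative_significance)
```

With the example threshold of 0.001, only T3 is flagged, in agreement with item 3) of the evaluation results.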

In theory, the significance evaluation of every fea¬ 
ture should be non-negative. However, there exist 
both theoretical-spectral components and noise com¬ 
ponents in observed data (equation [55]). The effec¬ 
tiveness of a redundant or non-significant feature is 
usually relatively low and can be overpowered by the 
effect of noise with a certain probability. Therefore, 
some detected features occasionally receive a negative significance evaluation, as in the case of L4 in Table 8.

7. On configuration of the proposed scheme and evaluation on spectra with ground-truth

7.1. Linearity vs. nonlinearity

LASSO is a method based on a linear model: it detects features according to the degree of linear correlation between a response and its predictors. It is therefore intuitive to choose a linear method for estimating the atmospheric parameters from the detected features.

On the other hand, non-linear relationships may also exist between a response and predictors that are highly linearly correlated with it. For example, suppose x = (x_1, x_2) are two predictors and y is a response, and the observed samples of (x_1, x_2, y) are (1,1,2), (2,2,4), (3,-3,5) and (4,4,8). It is evident that there exist some non-linear relationships between the two predictors and the response, even though the linear correlations between x_1 and y, and x_2 and y are as high as 0.9968 and 0.4722 respectively based on the observations. The experimental results in Table 6 and Table 10 indicate the existence of non-linear relationships between the detected features and the physical parameters.

7.2. On the choice of estimation method


Table 9: Compactness of the detected features in Table 4 for estimating [Fe/H] from stellar spectra. $S_{va}$ and $S^r_{va}$ are the evaluated values of the significance measure and the relative significance measure respectively on the validation set. MAE is the mean absolute error on the test set based on the features $F^i_{[Fe/H]}$. Items are sorted in decreasing order of $S^r_{va}$.

label   S_va        S^r_va    MAE
F3       0.008238    0.0452   0.191239
F8       0.004414    0.0242   0.186801
F10      0.002754    0.0151   0.184995
F11      0.002444    0.0134   0.186255
F7       0.001979    0.0109   0.184598
F12      0.001651    0.0091   0.183202
F6       0.001152    0.0063   0.183398
F9       0.001136    0.0062   0.183192
F5       0.000922    0.0051   0.183404
F2       0.000799    0.0044   0.182874
F4       0.000710    0.0039   0.183193
F1       0.000485    0.0027   0.182613
F13     -0.000157   -0.0009   0.182116
F14     -0.000078   -0.0004   0.182141


Table 10: Performance (MAE) of two linear methods. Experimental configurations are the same as in the experiments of Table 6. OLS (Ordinary Least Squares): linear least squares regression; SVR(linear): Support Vector machine Regression with a linear kernel.

evaluation method   log Teff   log g      [Fe/H]
OLS                 0.036510   0.301661   0.360890
SVR(linear)         0.034152   0.253363   0.323512


To capture the non-linearity in atmospheric parameter estimation, we investigate four typical non-linear regression methods: FNN (Feedforward Neural Network, implemented by the neural network toolbox in Matlab 2011b), GAM (Generalized Additive Models with smoothing splines (Hastie and Tibshirani 1990), implemented by the R package gam), MARS (Multivariate Adaptive Regression Splines (Friedman 1991), implemented by the R package mda), and RF (Random Forest (Breiman 2001; Liaw and Wiener 2002)). Parameters of the estimation methods are chosen based on the validation set.


Related evaluation results are presented in Table 11 and Table 6. They show that SVR is more applicable to this estimation problem.


7.3. Evaluation on spectra with ground-truth


The proposed scheme is also evaluated on 18 969 synthetic spectra. The synthetic spectra are calculated with the SPECTRUM (v2.76) package (Gray et al. 1994), with the New Grids of ATLAS9 Model Atmospheres (Castelli et al. 2003) as the stellar atmosphere model. In generating the synthetic spectra, 830 828 atomic and molecular lines are used (contained in the two files luke.lst and luke.nir.lst), and the atomic and molecular data come from the file stdatom.dat, which includes solar atomic abundances from Grevesse et al. (1998). The SPECTRUM package and the three data files can all be downloaded from the website given in the footnote.


Footnote: http://stellar.phys.appstate.edu/spectrum/download.html

Table 11: Performance (MAE) of four nonlinear methods. Experimental configurations are the same as in the experiments of Table 6.

evaluation method   log Teff   log g      [Fe/H]
FNN                 0.008980   0.186014   0.179565
GAM                 0.008139   0.245167   0.245111
MARS                0.011335   0.243147   0.242703
RF                  0.009478   0.228717   0.204248

Our grid of synthetic stellar spectra spans the parameter ranges [4000, 9750] K in Teff (45 values, with a step size of 100 K between 4000 K and 7500 K and 250 K between 7750 K and 9750 K), [1, 5] dex in log g (17 values, with a step size of 0.25 dex), and [-3.6, 0.3] dex in [Fe/H] (27 values, with a step size of 0.2 dex between -3.6 dex and -1 dex, and 0.1 dex between -1 dex and 0.3 dex).
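The node counts quoted for the grid can be checked with a short Python sketch. The helper `grid` and the use of integer units (to avoid floating-point round-off) are illustrative choices made here, not part of the original pipeline.

```python
def grid(start, stop, step):
    """Inclusive arithmetic grid on integers (avoids float round-off)."""
    return list(range(start, stop + 1, step))


# Teff in K: 100 K steps up to 7500 K, then 250 K steps up to 9750 K.
teff = grid(4000, 7500, 100) + grid(7750, 9750, 250)

# log g in units of 0.01 dex: 1.00 to 5.00 dex in steps of 0.25 dex.
logg = [v / 100 for v in grid(100, 500, 25)]

# [Fe/H] in units of 0.1 dex: 0.2 dex steps from -3.6 to -1.0,
# then 0.1 dex steps from -0.9 to 0.3.
feh = [v / 10 for v in grid(-36, -10, 2) + grid(-9, 3, 1)]

# Number of nodes in the full Cartesian product of the three axes.
n_nodes = len(teff) * len(logg) * len(feh)
```

The three axes reproduce the 45, 17 and 27 values stated above. Their full Cartesian product has 20 655 nodes, which is more than the 18 969 synthetic spectra used, so the sketch should be read as enumerating grid nodes only, not the list of available spectra.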


The synthetic stellar spectra are also partitioned into three subsets: a training set, a validation set, and a test set. The sizes of the three subsets are 8 500, 1 969 and 8 500 respectively. The training set is used for detecting features and computing the estimation model. The validation set and the test set are used for optimizing the parameters of SVR and for evaluating the performance of the learned model respectively.
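A partition with these sizes can be sketched as follows. The text does not say how the spectra were assigned to the subsets, so the random shuffle, the fixed seed and the function name `partition` are assumptions made only for illustration.

```python
import random


def partition(n_spectra, n_train, n_valid, seed=0):
    """Shuffle indices 0..n_spectra-1 and cut them into three disjoint subsets."""
    indices = list(range(n_spectra))
    random.Random(seed).shuffle(indices)   # random assignment is an assumption
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]     # the remainder forms the test set
    return train, valid, test


train, valid, test = partition(18969, 8500, 1969)
```

The remainder after removing the training and validation indices has exactly 8 500 elements, so the three sizes add up to the 18 969 synthetic spectra.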


The features detected from the synthetic training set are presented in Table 12, Table 13 and Table 14. In this experiment, we adjusted the threshold t by hand to detect approximately the same number of features as in the corresponding experiments on SDSS spectra. The numbers of detected features are 9 for estimating Teff, 19 for log g and 15 for [Fe/H]. Based on these features, the estimation results are presented in Table 15. In Re Fiorentin et al. (2007), the best consistency on synthetic spectra is obtained based on 100 principal components, and the MAEs are 0.0030 dex for log Teff, 0.0251 dex for log g and 0.0269 dex for [Fe/H] (Table 1 in Re Fiorentin et al. (2007)). Therefore, apart from the much lower complexity of computing spectral features, the proposed scheme in this work is also more accurate than the scheme based on PCA.


7.4. LASSO for spectral feature selection: feasibility, potential risks and alternatives


Feature selection chooses a subset of variables that collectively have good predictive power. According to the evaluation metric used for the predictive power, feature selection algorithms can be divided into three categories: filters, wrappers and embedded methods (Guyon and Elisseeff 2003).

Wrappers measure the effectiveness of a subset of variables by the accuracy of a learning machine of interest (a regression model or a classifier). Every subset should be used to train a model of the selected


Table 12: Detected typical positions for estimating Teff from synthetic stellar spectra. TPW: typical position in wavelength (Å); TPL: typical position in log(wavelength).

index   TPW (Å)      TPL
1       3933.3446    3.5948
2       4036.1008    3.6060
3       4221.4534    3.6255
4       4475.7106    3.6509
5       4501.5492    3.6534
6       5753.8959    3.7600
7       6496.2391    3.8127
8       6545.7890    3.8160
9       6547.2964    3.8161


Table 13: Detected typical positions for estimating log g from synthetic spectra. TPW: typical position in wavelength (Å); TPL: typical position in log(wavelength).

index   TPW (Å)      TPL
1       3835.8534    3.5839
2       3889.2154    3.5899
3       3933.3446    3.5948
4       3969.7394    3.5988
5       4101.6821    3.6130
6       4856.9317    3.6864
7       4858.0502    3.6865
8       5183.9643    3.7147
9       5240.3706    3.7194
10      5276.6951    3.7224
11      5316.9429    3.7257
12      5321.8423    3.7261
13      5323.0678    3.7262
14      5336.5674    3.7273
15      5368.6118    3.7299
16      5589.3589    3.7474
17      5657.9877    3.7527
18      5891.9900    3.7703
19      8467.6325    3.9278





Table 14: Detected typical positions for estimating [Fe/H] from synthetic spectra. TPW: typical position in wavelength (Å); TPL: typical position in log(wavelength).

index   TPW (Å)      TPL
1       3933.3446    3.5948
2       4340.7223    3.6376
3       4871.4921    3.6877
4       5176.8073    3.7141
5       5183.9643    3.7147
6       5275.4802    3.7223
7       5279.1257    3.7226
8       5287.6415    3.7233
9       5304.7143    3.7247
10      5316.9429    3.7257
11      5475.9818    3.7385
12      5527.9232    3.7426
13      5588.0721    3.7473
14      5615.1582    3.7494
15      8542.0479    3.9316





Table 15: Performance on synthetic spectra based on SVR and the features in Table 12, Table 13 and Table 14. Features are described by the LI method with integration radii k = 6, 2, 8 respectively for Teff, log g and [Fe/H]. MAE is the mean absolute error on the synthetic test set.

evaluation method   log Teff   log g      [Fe/H]
MAE                 0.000801   0.017881   0.013142
SD                  0.001277   0.071147   0.036305

learning machine. The number of possible combinations of variables increases exponentially with the number of observed variables. Therefore, wrappers are computationally intensive when the number of observed variables is large. Filters perform feature selection by a measure that is independent of the learning machine of interest. Methods of this kind are usually less computationally intensive than wrappers, but the selected features are not tuned to a specific learning machine. Embedded methods select features while learning the model of interest, so the selected features are optimal for that specific learning machine. Because of computational feasibility, the models available to embedded methods are limited, for example to a linear model.


In this work, we investigated the feasibility of exploring the possible subsets of spectral features for estimating atmospheric parameters by LASSO. LASSO is a feature selection method based on a linear model. However, experiments show that there is some non-linearity in the dependence of atmospheric parameters on observed spectral fluxes (Section 7.1). Therefore, LASSO played the role of a filter in this work, and there is a risk of missing features that are highly relevant to the estimation model of interest (SVR in this work). To reduce this risk, one option is to first select a larger subset of features by using a large value of t in inequality (10), and then to refine the features on the selected subset by a more computationally expensive embedded method, for example recursive forward selection (Liu and Zheng 2006) or backward elimination (Guyon and Elisseeff 2003). For the specific learning machine SVR, features can also be detected by built-in SVM feature selection algorithms (Becker et al. 2009; Weston et al. 2000).


8. Related Research 

To highlight the characteristics of the proposed 
scheme, related research is reviewed and analyzed in 
this section. 

Driven by the rapid development of spectrum-acquisition capability and by demand, many attempts to estimate atmospheric parameters directly from spectra have been reported in the literature. In automatically estimating physical parameters from a stellar spectrum, a key procedure is feature extraction, which determines the applicable range, accuracy, efficiency and physical interpretability of the corresponding system, as well as its robustness to noise and to distortion from calibration error. Therefore, we roughly classify related research into three categories based on the feature-extracting methods used: the line index method, the template matching method, and the statistical index scheme.


8.1. Line index method 


This kind of method is used to estimate atmo¬ 
spheric parameters by representing a stellar spec¬ 
trum with a description of typical lines, which is 
directly related to our knowledge about the stellar 
spectrum and astrophysics. A prominent charac¬ 
teristic of the line index method is physical inter¬ 
pretability. Therefore, this is a favorite method in 
spectrum analysis. 


For example, Muirhead et al. (2012) investigated the estimation of effective temperature Teff and metallicity [M/H] for late-K and M-type planet-candidate host stars from the K-band spectra released by the Kepler Mission, based on three spectral indices: the equivalent widths of the Na I (2.210 μm) and Ca I (2.260 μm) lines, and an index describing the change in flux between three 0.02 μm wide bands centered at 2.245, 2.370, and 2.080 μm respectively. Rojas-Ayala et al. (2012) further proposed a revised relationship that estimates the metallicities [Fe/H] and [M/H] of M dwarfs based on the three spectral indices. Mishenina et al. (2008) proposed a method to estimate effective temperature by line depth ratios, and two methods to estimate surface gravity log g based on the ionization balance of iron and on fitting the wings of the Ca I 6162.17 Å line. The fundamental parameters of 66 B-type stars are determined from the equivalent widths and/or line profile shapes of continuum-normalized hydrogen, helium, and silicon line profiles in Lefever et al. (2010). Posbic et al. (2012) developed software to determine radial velocity Vr, effective temperature Teff, surface gravity log g, metallicity [Fe/H], and individual abundances by a scheme relying on line-by-line modeling. Lee et al. (2008a) and Luo et al. (2008) each took a line index method as a component in developing their atmospheric parameter estimation systems for stellar spectra from SDSS and LAMOST/Guoshoujing Telescope respectively.


Despite the advantage of physical interpretability, the performance of this kind of method depends on the reliability of detecting spectral lines and the accuracy of their description, which are usually sensitive to noise


and calibration distortion in application (jHan et al 
201ll:lHanl[2nTl . 


8.2. Template matching method 

Suppose $S_l = \{(x^i, y_i),\ i = 1, \cdots, N\}$ is a library of templates and x is a stellar spectrum whose physical parameters y(x) need to be estimated, where $x^i$ is a template spectrum and $y_i$ is the corresponding physical parameter. If $x^{i_0}$ is the template most similar to x, then a basic implementation of the template matching method is to assign $y(x) = y_{i_0}$. The fundamental idea of this method is simple and intuitive: take the parameter of the most similar template as the estimate.

Therefore, it is also widely investigated in atmospheric parameter estimation. Its basic steps are:

• Construct a library of templates $S_l$;

• Find the k most similar template spectra in $S_l$ for a spectrum x whose physical parameter y(x) needs to be estimated, where k is a preset positive integer;

• Estimate y(x) by fusing the parameters of the k most similar template spectra.
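The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: similarity is measured by the Euclidean distance and the k parameters are fused by a plain average, whereas the works reviewed below use various other measures (chi-square, correlation coefficient). The function names and the toy library are hypothetical.

```python
import math


def euclidean(a, b):
    """Distance between two spectra sampled on the same wavelength grid."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def match_templates(x, library, k=3):
    """Estimate the parameter of spectrum x from its k most similar templates.

    library -- list of (template_spectrum, parameter) pairs, i.e. S_l
    """
    ranked = sorted(library, key=lambda pair: euclidean(x, pair[0]))
    nearest = ranked[:k]
    # Fuse the k parameters by a plain average (one of many possible choices).
    return sum(param for _, param in nearest) / len(nearest)


# Toy library: three-pixel "spectra" labelled with an effective temperature in K.
library = [
    ([1.0, 0.9, 0.8], 5000.0),
    ([1.0, 0.8, 0.6], 5500.0),
    ([1.0, 0.7, 0.4], 6000.0),
    ([1.0, 0.2, 0.1], 8000.0),
]
estimate_1 = match_templates([1.0, 0.82, 0.62], library, k=1)
estimate_2 = match_templates([1.0, 0.82, 0.62], library, k=2)
```

With k = 1 the sketch reduces to the basic nearest-template assignment; larger k trades locality for smoothing of the estimate.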


Key problems in this method are: 1) construction of the template library, which acts as a source or carrier of the professional knowledge needed to parameterize stellar spectra and is closely related to the accuracy of estimation and the applicable range of the corresponding system; 2) the similarity measure between two spectra, which embodies our understanding of the problem to be tackled and is also related to accuracy; 3) the scheme to organize the spectral template library and find the k most similar templates, as this scheme determines the efficiency of the system.


Due to the importance of the construction of the template library, this topic has attracted considerable attention. Gray et al. (1994) investigated the construction of synthetic stellar spectra based on Kurucz models (Kurucz 1992) and developed a publicly available program, SPECTRUM. Based on SPECTRUM and the New Grids of ATLAS9 Model Atmospheres (Serven et al. 2005), Du et al. (2012) synthesized a comprehensive library of 2 890 near-infrared spectra with resolution and wavelength sampling similar to those of SDSS and LAMOST, with parameters ranging from 3 500 to 7 500 K for effective temperature Teff, from 0.5 to 5.0 dex for surface gravity log g, and from -4.0 to 0.5 dex for [Fe/H]. Heiter et al. (2002) presented several sets of grids of model stellar atmospheres computed by modified versions of the ATLAS9 code, with parameters ranging from 4 000 to 10 000 K for Teff, from 2.0 to 5.0 dex for log g, and from -2.0 to 1.0 dex for metallicity [M/H]. Gustafsson et al. (2008) developed and used the program MARCS to construct late-type model atmospheres and presented a grid of about 10^4 model atmospheres for stars with parameters ranging from 2 500 K to 8 000 K in Teff, from -1 to 5 dex in log g and from -5 to +1 in [Me/H].

Based on the scheme used to evaluate the similarity between spectra, the template matching method can be implemented in the form of the nearest neighbor method (Zwitter et al. 2005), the k-nearest neighbor method (Liu et al. 20^), the chi-square minimization method (Jofre et al. 2010; Prieto et al. 2006), the correlation coefficient method (Liu et al. 2012),


etc. For example, Katz et al. (1998), Soubiran et al. (2000), and Soubiran et al. (2003) provided a software package, TGMET, based on a reduced chi-square minimization scheme and investigated the estimation of the physical parameters Teff, log g, and [Fe/H]. Shkedy et al. (2007) developed a method using a hierarchical Bayesian principle to estimate fundamental stellar parameters and their associated uncertainties from the infrared 2.38-2.60 μm Infrared Space Observatory (ISO) Short Wavelength Spectrometer (SWS) spectral data; in this method, both systematic and statistical measurement errors were taken into account. Liu et al. (2013) comprehensively compared the chi-square minimization method, the k-nearest neighbor method, and the correlation coefficient method in estimating atmospheric parameters from stellar spectra. Koleva et al. (2009) developed a full-spectrum fitting package, ULySS, and explored its application in parameterizing stellar spectra. Based on ULySS, Wu et al. re-estimated the physical parameters for the CFLIB spectral database (Wu et al. 2011a), explored new metal-poor star candidates from Guo Shoujing Telescope (LAMOST) commissioning observations
( Wu et ai1l20in[) . and constructed a set of stellar 
spectral templates to estimate physical parameters from LAMOST stellar spectra (Wu et al. 2011b).

Despite its advantages of simplicity and intuitiveness, the template matching method is essentially a global method that is sensitive to the accumulation of noise, distortion, and calibration error. With this method, it is also difficult to analyze and evaluate the effectiveness of local features of spectra, or to resolve the physical interpretation of a phenomenon.


8.3. Statistical index scheme 


To estimate atmospheric parameters from stellar spectra, the statistical index scheme is dedicated to establishing a mapping from spectral space (see the footnote) to the space of physical parameters by treating all of the fluxes of a spectrum equally and as a whole. Common methods of this kind include Principal Component Analysis (PCA; Jolliffe 2002), Wavelet (Mallat 2008), Neural Network (Bishop 1995), etc.


For example, Re Fiorentin et al. (2007) first projected a spectrum into a 50-dimensional PCA space and then estimated physical parameters by learning a mapping from the PCA space to the atmospheric parameter space with a nonlinear feedforward neural network. Zhang et al. (2006) estimated atmospheric parameters by establishing a mapping from PCA space to parameter space using a non-parametric estimator with variable window width. Manteiga et al. (2010) parameterized stellar spectra by extracting features based on Fourier analysis and wavelet decomposition, and constructing a mapping from the feature space to the parameter space by feedforward networks with three layers. After extracting features by the Haar wavelet, Lu et al. investigated the atmospheric parameterization problem by capturing the mapping to parameter space based on Support Vector Regression (SVR) (Lu et al. 2013) and the non-parametric regression method (Lu et al. 2012).


Among works based on neural networks, Bailer-Jones (2000) investigated the precision of estimating the stellar parameters Teff, log g, and [M/H] by a feedforward non-linear network with two hidden layers on synthetic spectra with different resolutions and signal-to-noise ratios. Snider et al. (2001) explored the application of back-propagation neural networks with one and two hidden layers in estimating atmospheric parameters from medium-resolution spectra of F- and G-type stars. Using a back-propagation neural network, Giridhar et al. (2006) studied the parameterization of a set of stellar spectra from the 2.3 m Vainu Bappu Telescope at Kavalur observatory, India. Willemsen et al. (2005) researched the parameterization of stellar spectra obtained at the VLT at ESO/Paranal (Chile) in visitor mode by using a feedforward neural network trained on synthetic spectra computed with the model atmospheres from Castelli et al. (1997) in combination with SPECTRUM (Gray et al. 1994). In the aforementioned works based on neural networks, there is no explicit or separate procedure for extracting spectral features. Actually, the data stream moving layer by layer from input to output is an iterative feature extraction procedure.

Footnote: A stellar spectrum is regarded as a vector in a high-dimensional space.


A prominent characteristic of this kind of method is that the parameterizing model of stellar spectra can be explored without prior knowledge of the physical atmospheric model that generates the spectra; in machine learning and artificial intelligence, methods with this characteristic are called black-box approaches. Therefore, the statistical index scheme is relatively easy to use. Furthermore, the results of a method of this kind are obtained statistically from a large number of spectra, which usually results in good overall performance. Meanwhile, the existence of noise, distortion, and calibration error usually makes the theoretical atmosphere model incomplete in reality, and we are obtaining stellar spectra on an unprecedented scale. Therefore, the statistical index scheme has good potential in spectral parameterization and in knowledge mining from large sets of spectra. Its limitation is the difficulty of resolving physical interpretations from its results. This work investigated a novel statistical index scheme for stellar spectrum parameterization with good physical interpretability.


9. Conclusion 

We propose a nonlinear scheme to automatically estimate three primary atmospheric physical parameters, Teff, log g, and [Fe/H], from SDSS stellar spectra. This scheme is driven by two sets of pre-parameterized stellar spectra, which act as two sources/carriers of the professional knowledge needed to parameterize stellar spectra, and are called a training set and a validation set respectively in pattern recognition and machine learning. Therefore, the proposed scheme is flexible and can be updated conveniently by replacing the knowledge carriers with two new ones to meet developing needs.

The proposed model consists of the following five procedures: 1) statistically detect typical wavelength positions of features from stellar spectra; 2) compute the description of spectral features based on local information near the detected typical positions; 3) refine features by evaluating the compactness of the extracted features; 4) learn a parameterizing model by the SVR algorithm based on training data; 5) estimate physical parameters by the learned model and the description of spectra. Procedures 1), 2), 3) and 4) are for constructing a stellar parameterizing model, while procedures 2) and 5) are used in parameterizing a new spectrum; procedure 3) is optional, depending on the specific requirements for sparseness and accuracy.


One prominent characteristic of the proposed scheme is sparseness. In this work, for example, every observed spectrum consists of 3 821 fluxes; our method detects 10, 19, and 14 typical wavelength positions to estimate Teff, log g, and [Fe/H] respectively. A stellar spectrum can then be described by a vector of 10, 19 and 14 components respectively for Teff, log g, and [Fe/H]; this is a dramatic reduction of data compared to the original 3 821 components and to the typical 50 components of related research (Re Fiorentin et al. 2007). Therefore, the features detected by this method are very sparse, which is closely related to the computing efficiency of the processing system and the physical interpretability of the results.


Another typical characteristic is locality. We 
propose two methods to describe features. The first 
uses the observed fluxes at the detected typical 
positions; that is, we simply pick the 10, 19, or 14 
fluxes at the detected wavelength positions out of 
the 3 821 fluxes as features to estimate Teff, log g, 
and [Fe/H] respectively. The second sums the nearest 
13, 5, or 17 fluxes around every detected position 
for Teff, log g, and [Fe/H] respectively. With the 
second method, computing the features for Teff 
requires only 120 additions. By contrast, extracting 
10 features with traditional Principal Component 
Analysis (PCA) requires approximately 38 210 
multiplications and 38 200 additions and involves 
nearly every flux in a spectrum. The proposed scheme 
is therefore considerably more efficient. Furthermore, 
because the proposed method uses only the fluxes near 
the detected positions, it is more robust to the 
accumulation of noise and to distortion from 
calibration error. For convenience, we name the two 
describing methods Point Description (PD) and Local 
Integration (LI) respectively. 
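
The two descriptions and the operation counts quoted above can be checked with a short sketch. It assumes that "the nearest fluxes" means an odd-width window centred on each detected position, and that a sum of w terms costs w - 1 additions; the function names are hypothetical and the flux values are toy data.

```python
def point_description(flux, positions):
    """PD: the single observed flux at each detected position."""
    return [flux[p] for p in positions]


def local_integration(flux, positions, width):
    """LI: sum of the `width` fluxes centred on each detected position."""
    half = width // 2
    return [sum(flux[max(0, p - half):p + half + 1]) for p in positions]


def li_additions(n_positions, width):
    """Additions needed for LI: (width - 1) per detected position."""
    return n_positions * (width - 1)


def pca_operations(n_components, n_fluxes):
    """(multiplications, additions) to project one spectrum onto PCA axes."""
    return n_components * n_fluxes, n_components * (n_fluxes - 1)


# Toy spectrum of 20 fluxes with two detected positions.
flux = list(range(20))
pd_features = point_description(flux, [5, 10])     # [5, 10]
li_features = local_integration(flux, [5, 10], 5)  # [25, 50]

# Counts for Teff: 10 positions, 13 fluxes each, 3 821 fluxes per spectrum.
teff_li_cost = li_additions(10, 13)                # 120
teff_pca_cost = pca_operations(10, 3821)           # (38210, 38200)
```

Under these assumptions the counts reproduce the 120 additions for LI and the 38 210 multiplications and 38 200 additions for a 10-component PCA projection.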

The consistencies of the proposed scheme with the 
reference parameters estimated by the SDSS SSPP 
are 0.007458 dex for log Teff (101.609921 K for Teff), 
0.189557 dex for log g, and 0.182060 dex for [Fe/H] 
if features are described by the LI method, where 
the accuracy is evaluated by mean absolute error 
(MAE). If features are described by the PD method, 
the accuracies are 0.009092 dex for log Teff 
(124.545075 K for Teff), 0.198928 dex for log g, and 
0.206814 dex for [Fe/H]. In a similar scenario, 
Re Fiorentin et al. (2007) investigated the stellar 
parameter estimation problem and obtained accuracies 
of 0.0126 dex for log Teff, 0.3644 dex for log g, and 
0.1949 dex for [Fe/H] on a test set of 19 000 stellar 
spectra from SDSS; Jofre et al. (2010) applied the 
MAχ method to a sample of 17 274 metal-poor dwarf 
stars from SDSS/SEGUE and estimated the metallicity 
with an average accuracy of 0.24 dex, the temperature 
with 130 K, and log g with 0.5 dex; Xin et al. (2013) 
proposed a scheme for parameterizing stellar spectra 
based on line indices and an artificial neural 
network, with accuracies of 147.8123 K for Teff, 
0.24757 dex for log g, and 0.19942 dex for [Fe/H] on 
9043 spectra from SDSS. Compared with the results of 
these related works in similar scenarios, the 
proposed scheme therefore performs well: with LI 
features, its errors are the lowest of those quoted 
above for all three parameters. 
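
For reference, the mean absolute error used as the consistency measure above is the average of |reference - estimate| over the test spectra. A minimal sketch follows; the function name and the three toy values are illustrative only and are not data from this work.

```python
def mean_absolute_error(reference, estimate):
    """Average absolute difference between reference and estimated values."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(r - e) for r, e in zip(reference, estimate))
    return total / len(reference)


# Toy log g values (dex): absolute errors are 0.5, 0.0 and 1.0.
mae_logg = mean_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
```

The same function applies unchanged to log Teff, Teff, and [Fe/H]; only the unit of the result differs.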


We also investigate the compactness of the 
detected features and introduce two concepts for 
this purpose: the Significance measure S and the 
Relative Significance measure S^r (see equation 
(15)). By compactness, we mean whether there is 
redundancy in the detected features and, if so, how 
much. The results show that the features can be 
refined with the measures S and S^r on a validation 
set if a more compact feature set is required. 